Quickstart#

Welcome to AudibleLight!#

This tutorial walks through the data generation and synthesis process end-to-end.

We’ll do the following:

  1. Create a basic Scene

  2. Add a tetrahedral microphone to the Scene

  3. Add some basic static sound Events

  4. Add some background Ambience

  5. Add some more advanced Events, including moving events and events with augmentations

  6. Render the whole scene to a first-order ambisonics audio file and metadata JSON file

For more information on any of these steps, you can check out the API documentation or the other tutorial files.

Import dependencies#

We need a few basic Python dependencies for this notebook. Note that audiblelight.utils contains basic utility functions that will come in handy when working with this package.

[1]:
import os
from pathlib import Path

from scipy import stats

from audiblelight import utils

Import Scene from audiblelight.core#

In this notebook, we’ll mostly be working with the Scene object. We should import it now.

The Scene is the highest level object within the AudibleLight API. It manages the soundscape and any listeners or events added to it, and is used to synthesise the entire audio file and metadata.

[2]:
from audiblelight.core import Scene

A note on backends#

Scene supports multiple backend types (which inherit from audiblelight.state.WorldState):

  • Ray-traced RIRs, using rlr-audio-propagation (backend="rlr")

  • Measured RIRs, reading from .sofa files in a manner similar to spatialscaper (backend="sofa")

  • Parametric (shoebox) RIRs, defined in a similar manner to pyroomacoustics

However, the underlying API is the same regardless of backend, making it easy to create complex datasets that work with different types of room impulse responses.

Set default values#

All of these values can (and should!) be changed in order to experiment with the functionality of AudibleLight.

[3]:
# OUTPUT DIRECTORY
OUTFOLDER = utils.get_project_root() / 'spatial_scenes'
os.makedirs(OUTFOLDER, exist_ok=True)
[4]:
# PATHS
FG_FOLDER = utils.get_project_root() / "tests/test_resources/soundevents"
MESH_PATH = utils.get_project_root() / "tests/test_resources/meshes/Oyens.glb"
NOISE_TYPE = "white"
[5]:
# SCENE SETTINGS
DURATION = 30.0  # seconds
MIC_ARRAY_NAME = 'ambeovr'    # could also be "eigenmike32"...
MAX_OVERLAP = 3   # maximum number of temporally overlapping sound-events

MICROPHONE_POSITION = [2.5, -1.0, 1.0]  # inside the living room
[6]:
# SCENE-WIDE DISTRIBUTIONS
MIN_VELOCITY, MAX_VELOCITY = 0.5, 1.5    # meters per second
MIN_SNR, MAX_SNR = 2, 8
MIN_RESOLUTION, MAX_RESOLUTION = 0.25, 2.0    # IRs (impulse responses) per second
REF_DB = -50    # noise floor
[7]:
# These can be changed at will
N_STATIC_EVENTS = 4
N_MOVING_EVENTS = 1
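
One subtlety worth noting before we use these values: scipy's `stats.uniform` takes a *location* and a *width*, not a minimum and a maximum. To sample between `MIN` and `MAX`, pass `(MIN, MAX - MIN)`:

```python
from scipy import stats

MIN_SNR, MAX_SNR = 2, 8

# scipy's uniform samples from [loc, loc + scale], so the width
# (not the maximum) goes in the second argument:
snr_dist = stats.uniform(MIN_SNR, MAX_SNR - MIN_SNR)
assert snr_dist.support() == (2.0, 8.0)

# Passing the maximum directly would silently widen the range to [2, 10]:
assert stats.uniform(MIN_SNR, MAX_SNR).support() == (2.0, 10.0)
```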

Make a Scene!#

Now, we’re ready to create a Scene object using the parameters we defined above.

By default, our Scene has the following properties:

  • A duration of 30 seconds

  • No more than 3 overlapping sound events at any one time

  • A noise floor level of -50 dB

  • Moving events at between 0.5 and 1.5 meters per second

  • Moving events with between 0.25 and 2.0 IRs per second

  • Events with peaks between 2 and 8 dB above the noise floor

For this example, we’ll use backend="rlr".

[8]:
# This function simply returns a fresh `Scene` object with the parameters set in the cells above
def create_scene() -> Scene:
    return Scene(
        duration=DURATION,
        sample_rate=44100,
        backend="rlr",
        backend_kwargs=dict(
            mesh=MESH_PATH
        ),
        scene_start_dist=stats.uniform(0.0, DURATION - 1),
        event_start_dist=None,
        event_duration_dist=None,
        event_velocity_dist=stats.uniform(MIN_VELOCITY, MAX_VELOCITY - MIN_VELOCITY),
        event_resolution_dist=stats.uniform(MIN_RESOLUTION, MAX_RESOLUTION - MIN_RESOLUTION),
        snr_dist=stats.uniform(MIN_SNR, MAX_SNR - MIN_SNR),
        fg_path=Path(FG_FOLDER),
        max_overlap=MAX_OVERLAP,
        ref_db=REF_DB
    )
[9]:
# Create a fresh scene object
scene = create_scene()
CreateContext: Context created

Now, we can visualise the Scene. The resulting object is interactive: try giving it a spin!

[10]:
out = scene.state.create_scene()
out.show()
[10]:

Add a listener#

Now, we’ll add a microphone to our mesh.

In AudibleLight, microphones are represented as subclasses of the audiblelight.micarrays.MicArray dataclass. A variety of standard microphone array geometries are included by default, or you can subclass this dataclass and create your own.
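
As an illustration of the subclassing pattern, here is a purely hypothetical custom array. The field names below mirror the microphone JSON printed further down in this notebook, but the real `audiblelight.micarrays.MicArray` base class may define them differently; treat this as a sketch of the idea, not the actual API:

```python
from dataclasses import dataclass, field

# Illustrative only: the real base class is audiblelight.micarrays.MicArray,
# and its exact attributes may differ from this sketch.
@dataclass
class MyTetrahedralArray:
    name: str = "mytetra"
    is_spherical: bool = True
    capsule_names: list = field(default_factory=lambda: ["FLU", "FRD", "BLD", "BRU"])
    # Capsule offsets from the array centre, in metres (hypothetical values):
    capsule_offsets: list = field(default_factory=lambda: [
        [0.006, 0.006, 0.006],
        [0.006, -0.006, -0.006],
        [-0.006, 0.006, -0.006],
        [-0.006, -0.006, 0.006],
    ])

    @property
    def n_capsules(self) -> int:
        return len(self.capsule_names)

array = MyTetrahedralArray()
assert array.n_capsules == 4
```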

For now, we can use scene.add_microphone to create a tetrahedral microphone inside the living room of our mesh.

The output of this microphone will be in Ambisonics A-Format (sometimes referred to as “MIC”). To work with B-Format directly, we can use the FOAListener object, which will output first-order Ambisonics audio.

[11]:
# Add the microphone type we want, at the desired position
scene.add_microphone(microphone_type=MIC_ARRAY_NAME, alias=MIC_ARRAY_NAME, position=MICROPHONE_POSITION)
CreateContext: Context created
Warning: initializing context twice. Will destroy old context and create a new one.
[12]:
# Print some information about the microphone
scene.get_microphone(alias=MIC_ARRAY_NAME)
[12]:
{
    "name": "ambeovr",
    "micarray_type": "AmbeoVR",
    "is_spherical": true,
    "channel_layout_type": "mic",
    "n_capsules": 4,
    "capsule_names": [
        "FLU",
        "FRD",
        "BLD",
        "BRU"
    ],
    "coordinates_absolute": [
        [
            2.5057922796533956,
            -0.9942077203466043,
            1.0057357643635105
        ],
        [
            2.5057922796533956,
            -1.0057922796533958,
            0.9942642356364896
        ],
        [
            2.4942077203466044,
            -0.9942077203466043,
            0.9942642356364896
        ],
        [
            2.4942077203466044,
            -1.0057922796533958,
            1.0057357643635105
        ]
    ],
    "coordinates_center": [
        2.5,
        -1.0,
        1.0
    ]
}

Add some sound sources#

Now, we’re ready to add some sound sources.

In AudibleLight, sound sources are represented by audiblelight.event.Event objects. Each Event is associated with one or more audiblelight.worldstate.Emitter objects, which dictate the position of the Event inside the mesh at a single point in time.

For a static sound source, an Event has one Emitter. For a moving sound source, an Event has multiple Emitters, depending on its velocity and resolution.

Note that Emitter objects should never be created directly. Instead, when we create an Event, we’ll automatically create the Emitter objects that it needs.
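
As a rough sketch of the arithmetic (this is an assumption about the rule, not the library's exact implementation): a moving event needs roughly one emitter per resolution step, plus one for its starting position.

```python
import math

def n_emitters(duration_s: float, resolution_irs_per_s: float) -> int:
    """One emitter per resolution step, plus the starting position
    (an assumption, not the library's exact rule)."""
    return math.ceil(duration_s * resolution_irs_per_s) + 1

# A 2-second event sampled at 1.5 IRs per second needs 4 emitters,
# consistent with the moving 'telephone' event created later in this notebook:
assert n_emitters(2.0, 1.5) == 4
```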

For now, we’ll just add in a small number of static Event objects with random positions and audio files.

[13]:
# Add the correct number of static sources
scene.clear_events()
for _ in range(N_STATIC_EVENTS):
    scene.add_event(event_type="static")
Warning: initializing context twice. Will destroy old context and create a new one.
CreateContext: Context created
Warning: initializing context twice. Will destroy old context and create a new one.
CreateContext: Context created
2025-10-07 15:01:55.494 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event000', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/doorCupboard/35632.wav' (unloaded, 0 augmentations), 1 emitter(s).
Warning: initializing context twice. Will destroy old context and create a new one.
CreateContext: Context created
2025-10-07 15:01:55.970 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event001', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/waterTap/205695.wav' (unloaded, 0 augmentations), 1 emitter(s).
CreateContext: Context created
Warning: initializing context twice. Will destroy old context and create a new one.
2025-10-07 15:01:56.436 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event002', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/maleSpeech/93899.wav' (unloaded, 0 augmentations), 1 emitter(s).
Warning: initializing context twice. Will destroy old context and create a new one.
CreateContext: Context created
2025-10-07 15:01:56.944 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Static 'Event' with alias 'event003', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/laughter/9547.wav' (unloaded, 0 augmentations), 1 emitter(s).

Add background noise#

In AudibleLight, Ambience objects capture non-moving, non-spatialised sound that is not associated with any particular position in the mesh. Adding this type of noise can be useful for training robust acoustic imaging systems.

To create Ambience, we have two choices:

  1. Pass in an audio filepath, which will be tiled to match the duration and channel count of the Scene

  2. Pass in the name of a particular noise type (e.g., white, pink)

For now, we’ll just add in white noise.

[14]:
scene.add_ambience(noise=NOISE_TYPE)
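
As a rough sketch of what this amounts to under the hood (an assumption, not the library's implementation), generating channel-matched white noise at the scene's noise floor might look like:

```python
import numpy as np

SAMPLE_RATE = 44100
DURATION = 30.0
N_CHANNELS = 4   # one per capsule of the tetrahedral mic
REF_DB = -50     # noise floor

rng = np.random.default_rng(0)

# Gaussian white noise, scaled so its RMS sits at the reference level:
n_samples = int(DURATION * SAMPLE_RATE)
noise = rng.standard_normal((N_CHANNELS, n_samples))
target_rms = 10 ** (REF_DB / 20)
noise *= target_rms / np.sqrt(np.mean(noise ** 2))

assert noise.shape == (4, 1323000)
```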

Add more advanced Event types#

AudibleLight has support for many different types of sound events, including sound events that move across a variety of trajectories, sound events placed in particular positions, and sound events with data augmentations (time-frequency domain masking, etc.). For more information, see the tutorial on adding Event objects to a Scene.

For now, we’ll just show how we can create a sound event that moves along a linear trajectory, starting from a position given in polar coordinates with respect to our microphone, with distortion applied to the audio file.

[15]:
from audiblelight.augmentation import Distortion

moving_event = scene.add_event(
    event_type="moving",
    alias="telephone",
    filepath=FG_FOLDER / "telephone/30085.wav",
    polar=True,
    position=[0.0, 90.0, 1.0],
    shape="linear",
    scene_start=5.0,    # start five seconds in to the scene
    spatial_resolution=1.5,
    spatial_velocity=1.0,
    duration=2,
    augmentations=Distortion
)
Warning: initializing context twice. Will destroy old context and create a new one.
2025-10-07 15:02:03.334 | INFO     | audiblelight.core:add_event:830 - Event added successfully: Moving 'Event' with alias 'telephone', audio file '/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/telephone/30085.wav' (unloaded, 1 augmentations), 4 emitter(s).
CreateContext: Context created
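
To unpack the polar position: `[0.0, 90.0, 1.0]` reads as (azimuth in degrees, elevation in degrees, distance in metres) relative to the microphone. Under the usual spherical convention (an assumption about the exact axes used by the library), that puts the event's starting emitter 1 m directly above the mic, which matches the first `telephone` emitter in the metadata printed later:

```python
import math

def polar_to_cartesian(azimuth_deg, elevation_deg, distance, origin):
    """Spherical-to-Cartesian under a common convention (an assumption:
    azimuth in the x-y plane, elevation up from the horizontal)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return [
        origin[0] + distance * math.cos(el) * math.cos(az),
        origin[1] + distance * math.cos(el) * math.sin(az),
        origin[2] + distance * math.sin(el),
    ]

mic = [2.5, -1.0, 1.0]
start = polar_to_cartesian(0.0, 90.0, 1.0, mic)
# Elevation 90° at distance 1 m puts the source directly above the mic:
assert all(abs(a - b) < 1e-9 for a, b in zip(start, [2.5, -1.0, 2.0]))
```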

We can also take a listen to our audio file. (Note that it has not been spatialised yet, so only the distortion will be audible.)

[16]:
from IPython.display import Audio

Audio(moving_event.load_audio(), rate=scene.sample_rate)
[16]:

Synthesise the audio and metadata#

As a recap, we have done the following:

  1. Created a Scene object from a mesh

  2. Added multiple static Event objects at random positions

  3. Added background white noise Ambience

  4. Added a single moving Event, with distortion applied, that follows a linear trajectory from a given position

We can now generate the spatial audio and metadata by calling Scene.generate and providing output paths to save the wav and json files.

[18]:
# Do the generation!
scene.generate(
    audio_fname=str(OUTFOLDER / "audio_out_random.wav"),
    metadata_fname=str(OUTFOLDER / "metadata_out_random.json"),
)
Warning: initializing context twice. Will destroy old context and create a new one.
2025-10-07 15:02:20.693 | INFO     | audiblelight.worldstate:simulate:1685 - Starting simulation with 8 emitters, 1 microphones
2025-10-07 15:02:49.395 | INFO     | audiblelight.worldstate:simulate:1693 - Finished simulation! Overall indirect ray efficiency: 0.997
CreateContext: Context created
2025-10-07 15:02:54.774 | INFO     | audiblelight.synthesize:render_audio_for_all_scene_events:571 - Rendered scene audio in 4.70 seconds!

The audio file and metadata should now be accessible inside our output folder.

[19]:
# Pretty print the metadata JSON
print(repr(scene))
{
    "audiblelight_version": "0.1.0",
    "rlr_audio_propagation_version": "0.0.1",
    "creation_time": "2025-08-20_12:07:50",
    "duration": 30.0,
    "ref_db": -50,
    "max_overlap": 3,
    "fg_path": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents",
    "ambience": {
        "ambience000": {
            "alias": "ambience000",
            "beta": 0,
            "filepath": null,
            "channels": 4,
            "sample_rate": 44100,
            "duration": 30.0,
            "ref_db": -50,
            "noise_kwargs": {}
        }
    },
    "events": {
        "event000": {
            "alias": "event000",
            "filename": "236657.wav",
            "filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/femaleSpeech/236657.wav",
            "class_id": null,
            "class_label": null,
            "is_moving": false,
            "scene_start": 24.235181686117272,
            "scene_end": 24.653979872058315,
            "event_start": 0.0,
            "event_end": 0.41879818594104307,
            "duration": 0.41879818594104307,
            "snr": 6.748027382626969,
            "sample_rate": 44100.0,
            "spatial_resolution": null,
            "spatial_velocity": null,
            "num_emitters": 1,
            "emitters": [
                [
                    4.35991043396444,
                    -3.044263530204878,
                    0.5891032136364933
                ]
            ],
            "emitters_relative": {
                "ambeovr": [
                    [
                        312.29653071540775,
                        -8.456447367319983,
                        2.7941217533134375
                    ]
                ]
            },
            "augmentations": []
        },
        "event001": {
            "alias": "event001",
            "filename": "205695.wav",
            "filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/waterTap/205695.wav",
            "class_id": null,
            "class_label": null,
            "is_moving": false,
            "scene_start": 3.4514953647255586,
            "scene_end": 9.511903527990864,
            "event_start": 0.0,
            "event_end": 6.060408163265306,
            "duration": 6.060408163265306,
            "snr": 2.723226107271114,
            "sample_rate": 44100.0,
            "spatial_resolution": null,
            "spatial_velocity": null,
            "num_emitters": 1,
            "emitters": [
                [
                    3.096275782297078,
                    -4.153302480826513,
                    1.2712634812270731
                ]
            ],
            "emitters_relative": {
                "ambeovr": [
                    [
                        280.7079487322296,
                        4.831569403526189,
                        3.2206280785567376
                    ]
                ]
            },
            "augmentations": []
        },
        "event002": {
            "alias": "event002",
            "filename": "236385.wav",
            "filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/femaleSpeech/236385.wav",
            "class_id": null,
            "class_label": null,
            "is_moving": false,
            "scene_start": 8.323269416047072,
            "scene_end": 8.715559665480178,
            "event_start": 0.0,
            "event_end": 0.3922902494331066,
            "duration": 0.3922902494331066,
            "snr": 9.022142809631786,
            "sample_rate": 44100.0,
            "spatial_resolution": null,
            "spatial_velocity": null,
            "num_emitters": 1,
            "emitters": [
                [
                    -0.3581484310736043,
                    -6.376464854662041,
                    0.4073888355057993
                ]
            ],
            "emitters_relative": {
                "ambeovr": [
                    [
                        242.00472446892127,
                        -5.558837275985748,
                        6.117726275320578
                    ]
                ]
            },
            "augmentations": []
        },
        "event003": {
            "alias": "event003",
            "filename": "242663.wav",
            "filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/femaleSpeech/242663.wav",
            "class_id": null,
            "class_label": null,
            "is_moving": false,
            "scene_start": 15.818482566614847,
            "scene_end": 16.271271682261105,
            "event_start": 0.0,
            "event_end": 0.4527891156462585,
            "duration": 0.4527891156462585,
            "snr": 6.620465733484581,
            "sample_rate": 44100.0,
            "spatial_resolution": null,
            "spatial_velocity": null,
            "num_emitters": 1,
            "emitters": [
                [
                    5.050466347179171,
                    -0.5452550625065076,
                    0.22202123889172043
                ]
            ],
            "emitters_relative": {
                "ambeovr": [
                    [
                        10.109529882311296,
                        -16.71490561353896,
                        2.704981053354163
                    ]
                ]
            },
            "augmentations": []
        },
        "telephone": {
            "alias": "telephone",
            "filename": "30085.wav",
            "filepath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/soundevents/telephone/30085.wav",
            "class_id": null,
            "class_label": null,
            "is_moving": true,
            "scene_start": 5.0,
            "scene_end": 7.0,
            "event_start": 0.0,
            "event_end": 2.0,
            "duration": 2.0,
            "snr": 4.650165135352253,
            "sample_rate": 44100.0,
            "spatial_resolution": 1.5,
            "spatial_velocity": 1.0,
            "num_emitters": 4,
            "emitters": [
                [
                    2.5,
                    -1.0,
                    2.0
                ],
                [
                    2.065109573640491,
                    -0.9013119306521276,
                    1.74037986035627
                ],
                [
                    1.6302191472809824,
                    -0.8026238613042551,
                    1.4807597207125398
                ],
                [
                    1.1953287209214736,
                    -0.7039357919563827,
                    1.2211395810688095
                ]
            ],
            "emitters_relative": {
                "ambeovr": [
                    [
                        0.0,
                        90.0,
                        1.0
                    ],
                    [
                        167.2146100280266,
                        58.93850546023989,
                        0.8643097567376731
                    ],
                    [
                        167.2146100280266,
                        28.32608625020126,
                        1.0132156635892788
                    ],
                    [
                        167.2146100280266,
                        9.385879919956016,
                        1.3559955295103967
                    ]
                ]
            },
            "augmentations": [
                {
                    "name": "Distortion",
                    "sample_rate": 44100,
                    "drive_db": 22.644900847314595
                }
            ]
        }
    },
    "state": {
        "emitters": {
            "event000": [
                [
                    4.35991043396444,
                    -3.044263530204878,
                    0.5891032136364933
                ]
            ],
            "event001": [
                [
                    3.096275782297078,
                    -4.153302480826513,
                    1.2712634812270731
                ]
            ],
            "event002": [
                [
                    -0.3581484310736043,
                    -6.376464854662041,
                    0.4073888355057993
                ]
            ],
            "event003": [
                [
                    5.050466347179171,
                    -0.5452550625065076,
                    0.22202123889172043
                ]
            ],
            "telephone": [
                [
                    2.5,
                    -1.0,
                    2.0
                ],
                [
                    2.065109573640491,
                    -0.9013119306521276,
                    1.74037986035627
                ],
                [
                    1.6302191472809824,
                    -0.8026238613042551,
                    1.4807597207125398
                ],
                [
                    1.1953287209214736,
                    -0.7039357919563827,
                    1.2211395810688095
                ]
            ]
        },
        "microphones": {
            "ambeovr": {
                "name": "ambeovr",
                "micarray_type": "AmbeoVR",
                "is_spherical": true,
                "n_capsules": 4,
                "capsule_names": [
                    "FLU",
                    "FRD",
                    "BLD",
                    "BRU"
                ],
                "coordinates_absolute": [
                    [
                        2.5057922796533956,
                        -0.9942077203466043,
                        1.0057357643635105
                    ],
                    [
                        2.5057922796533956,
                        -1.0057922796533958,
                        0.9942642356364896
                    ],
                    [
                        2.4942077203466044,
                        -0.9942077203466043,
                        0.9942642356364896
                    ],
                    [
                        2.4942077203466044,
                        -1.0057922796533958,
                        1.0057357643635105
                    ]
                ],
                "coordinates_center": [
                    2.5,
                    -1.0,
                    1.0
                ]
            }
        },
        "mesh": {
            "fname": "Oyens",
            "ftype": ".glb",
            "fpath": "/home/huw-cheston/Documents/python_projects/AudibleLight/tests/test_resources/meshes/Oyens.glb",
            "units": "meters",
            "from_gltf_primitive": false,
            "name": "defaultobject",
            "node": "defaultobject",
            "bounds": [
                [
                    -3.0433080196380615,
                    -10.448445320129395,
                    -1.1850370168685913
                ],
                [
                    5.973234176635742,
                    2.101027011871338,
                    2.4577369689941406
                ]
            ],
            "centroid": [
                1.527919030159762,
                -4.550817438070386,
                1.162934397641578
            ]
        },
        "rlr_config": {
            "diffraction": 1,
            "direct": 1,
            "direct_ray_count": 500,
            "direct_sh_order": 3,
            "frequency_bands": 4,
            "global_volume": 1.0,
            "hrtf_back": [
                0.0,
                0.0,
                1.0
            ],
            "hrtf_right": [
                1.0,
                0.0,
                0.0
            ],
            "hrtf_up": [
                0.0,
                1.0,
                0.0
            ],
            "indirect": 1,
            "indirect_ray_count": 5000,
            "indirect_ray_depth": 200,
            "indirect_sh_order": 1,
            "max_diffraction_order": 10,
            "max_ir_length": 4.0,
            "mesh_simplification": 0,
            "sample_rate": 44100.0,
            "size": 146,
            "source_ray_count": 200,
            "source_ray_depth": 10,
            "temporal_coherence": 0,
            "thread_count": 1,
            "transmission": 1,
            "unit_scale": 1.0
        },
        "empty_space_around_mic": 0.1,
        "empty_space_around_emitter": 0.2,
        "empty_space_around_surface": 0.2,
        "empty_space_around_capsule": 0.05,
        "repair_threshold": null
    }
}
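
As a sanity check, the `emitters_relative` values appear to be (azimuth in degrees, elevation in degrees, distance in metres) measured from the microphone centre. Assuming a standard spherical convention, we can recover event000's entry from its absolute position in the metadata above:

```python
import math

mic = [2.5, -1.0, 1.0]
emitter = [4.35991043396444, -3.044263530204878, 0.5891032136364933]

# Offset from the microphone centre:
dx, dy, dz = (e - m for e, m in zip(emitter, mic))

distance = math.sqrt(dx ** 2 + dy ** 2 + dz ** 2)
azimuth = math.degrees(math.atan2(dy, dx)) % 360
elevation = math.degrees(math.asin(dz / distance))

# These match the "emitters_relative" entry for event000 above:
assert abs(distance - 2.7941217533134375) < 1e-6
assert abs(azimuth - 312.29653071540775) < 1e-6
assert abs(elevation - -8.456447367319983) < 1e-6
```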

Create DCASE-style metadata#

The DCASE challenges use a special metadata format; more details can be found on the DCASE website.

AudibleLight can be used to generate this metadata from a Scene. In combination with the spatial audio we just generated above, this is enough to train a model like SELDNet (see the sharathadavanne/seld-dcase2023 repository).

[20]:
from audiblelight.synthesize import generate_dcase2024_metadata

dcase_out = generate_dcase2024_metadata(scene)
{'ambeovr':               active_class_index  source_number_index  azimuth  elevation  \
frame_number
28                             7                    0     -157          3
29                             7                    0     -157          3
30                             7                    0     -157          3
31                             7                    0     -157          3
32                             7                    0     -157          3
...                          ...                  ...      ...        ...
129                           10                    0     -100        -24
130                           10                    0     -100        -24
131                           10                    0     -100        -24
132                           10                    0     -100        -24
133                           10                    0     -100        -24

              distance
frame_number
28                 337
29                 337
30                 337
31                 337
32                 337
...                ...
129                196
130                196
131                196
132                196
133                196

[126 rows x 5 columns]}

By default, this function creates a dictionary of pandas.DataFrame objects, one for every microphone added to our scene. We can easily print just the first few frames for our AmbeoVR microphone:

[21]:
dcase_out["ambeovr"].head()
[21]:
              active_class_index  source_number_index  azimuth  elevation  distance
frame_number
28                             7                    0     -157          3       337
29                             7                    0     -157          3       337
30                             7                    0     -157          3       337
31                             7                    0     -157          3       337
32                             7                    0     -157          3       337

For more information on what any of these columns mean, refer to the DCASE community website.
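
If you need these frames on disk, each DataFrame can be written out in the usual way. A sketch follows; the headerless, frame-number-first layout is an assumption about the exact DCASE file format, so check the challenge specification before relying on it:

```python
import io
import pandas as pd

# A toy frame in the same shape as dcase_out["ambeovr"] above:
frames = pd.DataFrame(
    {
        "active_class_index": [7, 7],
        "source_number_index": [0, 0],
        "azimuth": [-157, -157],
        "elevation": [3, 3],
        "distance": [337, 337],
    },
    index=pd.Index([28, 29], name="frame_number"),
)

# DCASE metadata files are conventionally headerless CSVs with the frame
# number as the first column (an assumption about the exact layout):
buffer = io.StringIO()
frames.to_csv(buffer, header=False)
assert buffer.getvalue().splitlines()[0] == "28,7,0,-157,3,337"
```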

Recreating a Scene from metadata#

Finally, note that we can also re-create a Scene object from scratch, just by reloading our JSON:

[20]:
reloaded_scene = Scene.from_json(str(OUTFOLDER / "metadata_out_random.json"))
assert reloaded_scene == scene
2025-08-20 12:07:50.575 | WARNING  | audiblelight.core:from_dict:1115 - Currently, distributions cannot be loaded with `Scene.from_dict`. You will need to manually redefine these using, for instance, setattr(scene, 'event_start_dist', ...), repeating this for every distribution.
CreateContext: Context created
Material for category 'default' was not found. Using default material instead.
Material for category 'default' was not found. Using default material instead.
CreateContext: Context created

That’s the end of the quickstart guide for AudibleLight! For more information, check out the rest of the tutorials or take a look at the API documentation.